Apparatus and a Method for Decoding an Encoded Audio Signal

ABSTRACT

An apparatus for decoding an encoded audio signal having first and second portions encoded in accordance with first and second encoding algorithms, respectively, BWE parameters for the first and second portions and a coding mode information indicating a first or a second decoding algorithm, includes first and second decoders, a BWE module and a controller. The decoders decode portions in accordance with decoding algorithms for time portions of the encoded signal to obtain decoded signals. The BWE module has a controllable crossover frequency and is configured for performing a bandwidth extension algorithm using the first decoded signal and the BWE parameters for the first portion, and for performing a bandwidth extension algorithm using the second decoded signal and the bandwidth extension parameter for the second portion. The controller controls the crossover frequency for the BWE module in accordance with the coding mode information.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of copending International Patent Application No. PCT/EP2009/004522 filed Jun. 23, 2009, and claims priority to U.S. Application No. 61/079,841, filed Jul. 11, 2008, and additionally claims priority from U.S. Application 61/103,820, filed Aug. 10, 2008, all of which are incorporated herein by reference in their entirety.

BACKGROUND OF THE INVENTION

The present invention relates to an apparatus and a method for decoding an encoded audio signal, an apparatus for encoding, a method for encoding and an audio signal.

In the art, frequency domain coding schemes such as MP3 or AAC are known. These frequency-domain encoders are based on a time-domain/frequency-domain conversion, a subsequent quantization stage, in which the quantization error is controlled using information from a psychoacoustic module, and an encoding stage, in which the quantized spectral coefficients and corresponding side information are entropy-encoded using code tables.

On the other hand, there are encoders that are very well suited to speech processing, such as the AMR-WB+ as described in 3GPP TS 26.290. Such speech coding schemes perform a Linear Predictive filtering of a time-domain signal. Such LP filtering is derived from a Linear Prediction analysis of the input time-domain signal. The resulting LP filter coefficients are then coded and transmitted as side information. The process is known as Linear Prediction Coding (LPC). At the output of the filter, the prediction residual signal or prediction error signal, which is also known as the excitation signal, is encoded using the analysis-by-synthesis stages of the ACELP encoder or, alternatively, is encoded using a transform encoder which uses a Fourier transform with an overlap. The decision between the ACELP coding and the Transform Coded eXcitation coding, which is also called TCX coding, is made using a closed loop or an open loop algorithm.
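The LP analysis and the resulting prediction residual described above can be illustrated with a minimal sketch. It assumes the autocorrelation method with a Levinson-Durbin recursion, which is one common way to derive LP coefficients; the function names `autocorr`, `levinson_durbin` and `residual` are hypothetical and are not taken from AMR-WB+ or from this specification.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of the sequence x."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations for predictor coefficients a[1..order].

    The predictor is x_hat[n] = sum_k a[k] * x[n - k]. Assumes r[0] > 0.
    Returns (coefficients, final prediction error energy).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err


def residual(x, coeffs):
    """Prediction error (excitation) e[n] = x[n] - x_hat[n]."""
    out = []
    for n in range(len(x)):
        pred = sum(c * x[n - k]
                   for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(x[n] - pred)
    return out


# Illustrative input: a decaying first-order sequence x[n] = 0.9**n.
signal = [0.9 ** n for n in range(200)]
coeffs, _ = levinson_durbin(autocorr(signal, 1), 1)
excitation = residual(signal, coeffs)
```

For this illustrative input the single coefficient comes out close to 0.9, and the residual carries far less energy than the input, which is why coding the excitation plus the coefficients can be cheaper than coding the waveform directly.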

Frequency-domain audio coding schemes such as the high efficiency-AAC encoding scheme, which combines an AAC coding scheme and a spectral bandwidth replication technique, can also be combined with a joint stereo or a multi-channel coding tool which is known under the term “MPEG surround”. On the other hand, speech encoders such as the AMR-WB+ also have a high frequency enhancement stage and a stereo functionality.

Said spectral band replication (SBR) comprises a technique that gained popularity as an add-on to popular perceptual audio coders such as MP3 and the advanced audio coding (AAC). SBR comprises a method of bandwidth extension (BWE) in which the low band (base band or core band) of the spectrum is encoded using an existing codec, whereas the upper band (or high band) is coarsely parameterized using fewer parameters. SBR makes use of a correlation between the low band and the high band in order to predict the high band signal from extracted lower band features.

SBR is, for example, used in HE-AAC or AAC+SBR. In SBR it is possible to dynamically change the crossover frequency (BWE start frequency) as well as the temporal resolution, meaning the number of parameter sets (envelopes) per frame. AMR-WB+ implements a time domain bandwidth extension in combination with a switched time/frequency domain core coder, giving good audio quality especially for speech signals. A limiting factor to AMR-WB+ audio quality is the audio bandwidth common to both core codecs and a BWE start frequency that is one quarter of the system's internal sampling frequency. While the ACELP speech model is capable of modeling speech signals quite well over the full bandwidth, the frequency domain audio coder fails to deliver decent quality for some general audio signals. Thus, speech coding schemes show a high quality for speech signals even at low bit rates, but show a poor quality for music signals at low bit rates.

Frequency-domain coding schemes such as HE-AAC are advantageous in that they show a high quality at low bit rates for music signals. Problematic, however, is the quality of speech signals at low bit rates.

Therefore, different classes of audio signal demand different characteristics of the bandwidth extension tool.

SUMMARY

According to an embodiment, an apparatus for decoding an encoded audio signal, the encoded audio signal having a first portion encoded in accordance with a first encoding algorithm, a second portion encoded in accordance with a second encoding algorithm, BWE parameters for the first portion and the second portion and a coding mode information indicating a first decoding algorithm or a second decoding algorithm, may have: a first decoder for decoding the first portion in accordance with the first decoding algorithm for a first time portion of the encoded signal to acquire a first decoded signal, wherein the first decoder has an LPC-based coder; a second decoder for decoding the second portion in accordance with the second decoding algorithm for a second time portion of the encoded signal to acquire a second decoded signal, wherein the second decoder has a transform-based coder; a BWE module having a controllable crossover frequency, the BWE module being configured for performing a bandwidth extension algorithm using the first decoded signal and the BWE parameters for the first portion, and for performing a bandwidth extension algorithm using the second decoded signal and the bandwidth extension parameter for the second portion, wherein the BWE module is configured to use a first crossover frequency for the bandwidth extension for the first decoded signal and to use a second crossover frequency for the bandwidth extension for the second decoded signal, wherein the first crossover frequency is higher than the second crossover frequency; and a controller for controlling the crossover frequency for the BWE module in accordance with the coding mode information.

According to another embodiment, an apparatus for encoding an audio signal may have: a first encoder which is configured to encode in accordance with a first encoding algorithm, the first encoding algorithm having a first frequency bandwidth, wherein the first encoder has an LPC-based coder; a second encoder which is configured to encode in accordance with a second encoding algorithm, the second encoding algorithm having a second frequency bandwidth being smaller than the first frequency bandwidth, wherein the second encoder has a transform-based coder; a decision stage for indicating the first encoding algorithm for a first portion of the audio signal and for indicating the second encoding algorithm for a second portion of the audio signal, the second portion being different from the first portion; and a bandwidth extension module for calculating BWE parameters for the audio signal, wherein the BWE module is configured to be controlled by the decision stage to calculate the BWE parameters for a band not having the first frequency bandwidth in the first portion of the audio signal and for a band not having the second frequency bandwidth in the second portion of the audio signal, wherein the first or the second frequency bandwidth is defined by a variable crossover frequency and wherein the decision stage is configured to output the variable crossover frequency, wherein the BWE module is configured to use a first crossover frequency for calculating the BWE parameters for a signal encoded using the first encoder and to use a second crossover frequency for a signal encoded using the second encoder, wherein the first crossover frequency is higher than the second crossover frequency.

According to another embodiment, a method for decoding an encoded audio signal, the encoded audio signal having a first portion encoded in accordance with a first encoding algorithm, a second portion encoded in accordance with a second encoding algorithm, BWE parameters for the first portion and the second portion and a coding mode information indicating a first decoding algorithm or a second decoding algorithm, may have the steps of: decoding the first portion in accordance with the first decoding algorithm for a first time portion of the encoded signal to acquire a first decoded signal, wherein decoding the first portion includes using an LPC-based coder; decoding the second portion in accordance with the second decoding algorithm for a second time portion of the encoded signal to acquire a second decoded signal, wherein decoding the second portion includes using a transform-based coder; performing a bandwidth extension algorithm by a BWE module including a controllable crossover frequency, using the first decoded signal and the BWE parameters for the first portion, and performing, by the BWE module having the controllable crossover frequency, a bandwidth extension algorithm using the second decoded signal and the bandwidth extension parameter for the second portion, wherein a first crossover frequency is used for the bandwidth extension for the first decoded signal and a second crossover frequency is used for the bandwidth extension for the second decoded signal, wherein the first crossover frequency is higher than the second crossover frequency; and controlling the crossover frequency for the BWE module in accordance with the coding mode information.

According to another embodiment, a method for encoding an audio signal may have the steps of: encoding in accordance with a first encoding algorithm, the first encoding algorithm having a first frequency bandwidth, wherein encoding in accordance with a first encoding algorithm includes using an LPC-based coder; encoding in accordance with a second encoding algorithm, the second encoding algorithm having a second frequency bandwidth being smaller than the first frequency bandwidth, wherein encoding in accordance with a second encoding algorithm includes using a transform-based coder; indicating the first encoding algorithm for a first portion of the audio signal and the second encoding algorithm for a second portion of the audio signal, the second portion being different from the first portion; and calculating BWE parameters for the audio signal such that the BWE parameters are calculated for a band not having the first frequency bandwidth in the first portion of the audio signal and for a band not having the second frequency bandwidth in the second portion of the audio signal, wherein the first or the second frequency bandwidth is defined by a variable crossover frequency, wherein the BWE module is configured to use a first crossover frequency for calculating the BWE parameters for a signal encoded using the LPC-based coder and to use a second crossover frequency for a signal encoded using the transform-based coder, wherein the first crossover frequency is higher than the second crossover frequency.

According to another embodiment, an encoded audio signal may have: a first portion encoded in accordance with a first encoding algorithm, the first encoding algorithm having an LPC-based coder; a second portion encoded in accordance with a second different encoding algorithm, the second encoding algorithm having a transform-based coder; bandwidth extension parameters for the first portion and the second portion; and a coding mode information indicating a first crossover frequency used for the first portion or a second crossover frequency used for the second portion, wherein the first crossover frequency is higher than the second crossover frequency.

Another embodiment has a computer program for performing, when running on a computer, the method for encoding an audio signal, which method may have the steps of: encoding in accordance with a first encoding algorithm, the first encoding algorithm having a first frequency bandwidth, wherein encoding in accordance with a first encoding algorithm includes using an LPC-based coder; encoding in accordance with a second encoding algorithm, the second encoding algorithm having a second frequency bandwidth being smaller than the first frequency bandwidth, wherein encoding in accordance with a second encoding algorithm includes using a transform-based coder; indicating the first encoding algorithm for a first portion of the audio signal and the second encoding algorithm for a second portion of the audio signal, the second portion being different from the first portion; and calculating BWE parameters for the audio signal such that the BWE parameters are calculated for a band not having the first frequency bandwidth in the first portion of the audio signal and for a band not having the second frequency bandwidth in the second portion of the audio signal, wherein the first or the second frequency bandwidth is defined by a variable crossover frequency, wherein the BWE module is configured to use a first crossover frequency for calculating the BWE parameters for a signal encoded using the LPC-based coder and to use a second crossover frequency for a signal encoded using the transform-based coder, wherein the first crossover frequency is higher than the second crossover frequency.

The present invention is based on the finding that the crossover frequency or the BWE start frequency is a parameter influencing the audio quality. While time domain (speech) codecs usually code the whole frequency range for a given sampling rate, audio bandwidth is a tuning parameter for transform-based coders (e.g. coders for music), as decreasing the total number of spectral lines to encode will at the same time increase the number of bits per spectral line available for encoding, meaning a quality versus audio bandwidth trade-off is made. Hence, in the new approach, different core coders with variable audio bandwidths are combined into a switched system with one common BWE module, wherein the BWE module has to account for the different audio bandwidths.
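The trade-off between audio bandwidth and bits per spectral line can be made concrete with a small calculation. This is only an illustration: the frame budget of 1000 bits, the 25 Hz line spacing and the function name `bits_per_line` are assumed values chosen for the example and do not appear in the specification.

```python
def bits_per_line(frame_bits, bandwidth_hz, line_spacing_hz):
    """Average bit budget per coded spectral line for one frame."""
    lines = int(bandwidth_hz // line_spacing_hz)  # lines below the core bandwidth
    return frame_bits / lines


# Same (assumed) bit budget, two different core coder bandwidths.
wide = bits_per_line(frame_bits=1000, bandwidth_hz=8000, line_spacing_hz=25)
narrow = bits_per_line(frame_bits=1000, bandwidth_hz=4000, line_spacing_hz=25)
```

Halving the coded bandwidth halves the number of spectral lines and therefore doubles the average number of bits available per line, at the cost of leaving a larger frequency range to the BWE module.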

A straightforward way would be to find the lowest of all core coder bandwidths and use this as BWE start frequency, but this would deteriorate the perceived audio quality. Also, the coding efficiency would be reduced, because in time sections where a core coder is active which has a higher bandwidth than the BWE start frequency, some frequency regions would be represented twice, by the core coder as well as the BWE, which introduces redundancy. A better solution is therefore to adapt the BWE start frequency to the audio bandwidth of the core coder used.
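The redundancy argument can be sketched as follows. The bandwidth table is an assumption made for illustration (the description later mentions crossover values of about 4 kHz or 5 kHz, with the higher one belonging to the LPC-based coder); the names `CORE_BANDWIDTH_HZ`, `common_start`, `adapted_start` and `overlap_hz` are hypothetical.

```python
# Assumed core coder bandwidths in Hz, for illustration only.
CORE_BANDWIDTH_HZ = {"lpc": 5000, "transform": 4000}


def common_start():
    """Straightforward approach: one BWE start frequency for all core coders."""
    return min(CORE_BANDWIDTH_HZ.values())


def adapted_start(mode):
    """Adapted approach: BWE starts where the active core coder ends."""
    return CORE_BANDWIDTH_HZ[mode]


def overlap_hz(mode, bwe_start_hz):
    """Width of the region coded twice (by the core coder and by the BWE)."""
    return max(0, CORE_BANDWIDTH_HZ[mode] - bwe_start_hz)


redundant = overlap_hz("lpc", common_start())            # fixed start frequency
not_redundant = overlap_hz("lpc", adapted_start("lpc"))  # adapted start frequency
```

With a fixed start frequency, the region between 4 kHz and 5 kHz is represented twice whenever the LPC-based coder is active; with an adapted start frequency the overlap vanishes for both coders.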

Therefore, according to embodiments of the present invention, an audio coding system combines a bandwidth extension tool with a signal dependent core coder (for example a switched speech/audio coder), wherein the crossover frequency comprises a variable parameter. A signal classifier output that controls the switching between different core coding modes may also be used to switch the characteristics of the BWE system such as the temporal resolution and smearing, spectral resolution and the crossover frequency.

Therefore, one aspect of the present invention is an audio decoder for an encoded audio signal, the encoded audio signal comprising a first portion encoded in accordance with a first encoding algorithm, a second portion encoded in accordance with a second encoding algorithm, BWE parameters for the first portion and the second portion and a coding mode information indicating a first decoding algorithm or a second decoding algorithm, comprising a first decoder, a second decoder, a BWE module and a controller. The first decoder decodes the first portion in accordance with the first decoding algorithm for a first time portion of the encoded signal to obtain a first decoded signal. The second decoder decodes the second portion in accordance with the second decoding algorithm for a second time portion of the encoded signal to obtain a second decoded signal. The BWE module has a controllable crossover frequency and is configured for performing a bandwidth extension algorithm using the first decoded signal and the BWE parameters for the first portion, and for performing a bandwidth extension algorithm using the second decoded signal and the bandwidth extension parameter for the second portion. The controller controls the crossover frequency for the BWE module in accordance with the coding mode information.

According to another aspect of the present invention, an apparatus for encoding an audio signal comprises a first and a second encoder, a decision stage and a BWE module. The first encoder is configured to encode in accordance with a first encoding algorithm, the first encoding algorithm having a first frequency bandwidth. The second encoder is configured to encode in accordance with a second encoding algorithm, the second encoding algorithm having a second frequency bandwidth being smaller than the first frequency bandwidth. The decision stage indicates the first encoding algorithm for a first portion of the audio signal and the second encoding algorithm for a second portion of the audio signal, the second portion being different from the first portion. The bandwidth extension module calculates BWE parameters for the audio signal, wherein the BWE module is configured to be controlled by the decision stage to calculate the BWE parameters for a band not including the first frequency bandwidth in the first portion of the audio signal and for a band not including the second frequency bandwidth in the second portion of the audio signal.

In contrast to embodiments, SBR in conventional technology is applied to a non-switched audio codec only, which results in the following disadvantages. Both temporal resolution and crossover frequency could be applied dynamically, but state-of-the-art implementations such as 3GPP source usually apply only a change of temporal resolution for transients such as, for example, castanets. Furthermore, a finer overall temporal resolution might be chosen at higher rates as a bit rate dependent tuning parameter. No explicit classification is carried out to determine the temporal resolution, or a decision threshold controlling the temporal resolution, that best matches the signal type, for example stationary, tonal music versus speech. Embodiments of the present invention overcome these disadvantages. Embodiments especially allow an adapted crossover frequency combined with a flexible choice of the core coder used, so that the coded signal provides a significantly higher perceptual quality compared to encoders/decoders of conventional technology.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:

FIG. 1 shows a block diagram of an apparatus for decoding in accordance with a first aspect of the present invention;

FIG. 2 shows a block diagram of an apparatus for encoding in accordance with the first aspect of the present invention;

FIG. 3 shows a block diagram of an encoding scheme in more detail;

FIG. 4 shows a block diagram of a decoding scheme in more detail;

FIG. 5 shows a block diagram of an encoding scheme in accordance with a second aspect;

FIG. 6 is a schematic diagram of a decoding scheme in accordance with the second aspect;

FIG. 7 illustrates an encoder-side LPC stage providing short-term prediction information and the prediction error signal;

FIG. 8 illustrates a further embodiment of an LPC device for generating a weighted signal;

FIGS. 9 a-9 b show an encoder comprising an audio/speech-switch resulting in different temporal resolution for an audio signal; and

FIG. 10 illustrates a representation for an encoded audio signal.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1 shows a decoder apparatus 100 for decoding an encoded audio signal 102. The encoded audio signal 102 comprises a first portion 104 a encoded in accordance with the first encoding algorithm, a second portion 104 b encoded in accordance with a second encoding algorithm, BWE parameter 106 for the first time portion 104 a and the second time portion 104 b and a coding mode information 108 indicating a first decoding algorithm or a second decoding algorithm for the respective time portions. The apparatus for decoding 100 comprises a first decoder 110 a, a second decoder 110 b, a BWE module 130 and a controller 140. The first decoder 110 a is adapted to decode the first portion 104 a in accordance with the first decoding algorithm for a first time portion of the encoded signal 102 to obtain a first decoded signal 114 a. The second decoder 110 b is configured to decode the second portion 104 b in accordance with the second decoding algorithm for a second time portion of the encoded signal to obtain a second decoded signal 114 b. The BWE module 130 has a controllable crossover frequency fx that adjusts the behavior of the BWE module 130. The BWE module 130 is configured to perform a bandwidth extension algorithm to generate components of the audio signal in the upper frequency band based on the first decoded signal 114 a and the BWE parameters 106 for the first portion, and to generate components of the audio signal in the upper frequency band based on the second decoded signal 114 b and the bandwidth extension parameter 106 for the second portion. The controller 140 is configured to control the crossover frequency fx of the BWE module 130 in accordance with the coding mode information 108.

The BWE module 130 may also comprise a combiner which combines the audio signal components of the lower and the upper frequency band and outputs the resulting audio signal 105.

The coding mode information 108 indicates, for example, which time portion of the encoded audio signal 102 is encoded by which encoding algorithm. This information may at the same time identify the decoder to be used for the different time portions. In addition, the coding mode information 108 may control a switch to switch between different decoders for different time portions.

Hence, the crossover frequency fx is an adjustable parameter which is adjusted in accordance with the used decoder which may, for example, comprise a speech coder as the first decoder 110 a and an audio decoder as the second decoder 110 b. As said above, the crossover frequency fx for a speech decoder (as for example based on LPC) may be higher than the crossover frequency used for an audio decoder (e.g. for music). Thus, in further embodiments the controller 220 is configured to increase the crossover frequency fx or to decrease the crossover frequency fx within one of the time portions (e.g. the second time portion) so that the crossover frequency may be changed without changing the decoding algorithm. This means that a change in the crossover frequency need not be related to a change in the used decoder: the crossover frequency may be changed without changing the used decoder or, vice versa, the decoder may be changed without changing the crossover frequency.

The BWE module 130 may also comprise a switch which is controlled by the controller 140 and/or by the BWE parameter 106 so that the first decoded signal 114 a is processed by the BWE module 130 during the first time portion and the second decoded signal 114 b is processed by the BWE module 130 during the second time portion. This switch may be activated by a change in the crossover frequency fx or by an explicit bit within the encoded audio signal 102 indicating the used encoding algorithm during the respective time portion.

In further embodiments the switch is configured to switch between the first and second time portion from the first decoder to the second decoder so that the bandwidth extension algorithm is either applied to the first decoded signal or to the second decoded signal. Alternatively, the bandwidth extension algorithm is applied to the first and/or to the second decoded signal and the switch is placed after this so that one of the bandwidth extended signals is dropped.
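The decoder-side control flow of FIG. 1 can be summarized in a minimal sketch, assuming a frame-wise bitstream in which every frame carries its coding mode. The classes `Frame` and `BweModule`, the function `decode_stream`, the stub decoders and the crossover values are all hypothetical; the actual decoding and bandwidth extension algorithms are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    mode: str                      # coding mode information: "lpc" or "transform"
    payload: str                   # stand-in for the coded core data of the time portion
    bwe_params: dict = field(default_factory=dict)


class BweModule:
    """Stand-in for a BWE module with a controllable crossover frequency."""

    def __init__(self):
        self.crossover_hz = None

    def extend(self, decoded, bwe_params):
        # A real module would synthesize the band above crossover_hz here.
        return {"low_band": decoded,
                "high_band_from_hz": self.crossover_hz,
                "params": bwe_params}


def decode_stream(frames, decoders, crossover_by_mode):
    """Controller loop: pick the decoder and set fx from the coding mode."""
    bwe = BweModule()
    out = []
    for frame in frames:
        decoded = decoders[frame.mode](frame.payload)       # switch between decoders
        bwe.crossover_hz = crossover_by_mode[frame.mode]    # controller sets fx
        out.append(bwe.extend(decoded, frame.bwe_params))
    return out


# Stub decoders and assumed crossover values, for illustration only.
decoders = {"lpc": lambda p: "lpc:" + p, "transform": lambda p: "tr:" + p}
crossover_by_mode = {"lpc": 5000, "transform": 4000}
frames = [Frame("lpc", "a"), Frame("transform", "b"), Frame("lpc", "c")]
result = decode_stream(frames, decoders, crossover_by_mode)
```

This sketch follows the first variant described above, in which the switch sits in front of the BWE module so that only one decoded signal is extended per time portion.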

FIG. 2 shows a block diagram for an apparatus 200 for encoding an audio signal 105. The apparatus for encoding 200 comprises a first encoder 210 a, a second encoder 210 b, a decision stage 220 and a bandwidth extension module (BWE module) 230. The first encoder 210 a is operative to encode in accordance with a first encoding algorithm having a first frequency bandwidth. The second encoder 210 b is operative to encode in accordance with a second encoding algorithm having a second frequency bandwidth being smaller than the first frequency bandwidth. The first encoder may, for example, be a speech coder such as an LPC-based coder, whereas the second encoder 210 b may comprise an audio (music) encoder. The decision stage 220 is configured to indicate the first encoding algorithm for a first portion 204 a of the audio signal 105 and to indicate the second encoding algorithm for a second portion 204 b of the audio signal 105, wherein the second time portion is different from the first time portion. The first portion 204 a may correspond to a first time portion and the second portion 204 b may correspond to a second time portion which is different from the first time portion.

The BWE module 230 is configured to calculate BWE parameters 106 for the audio signal 105 and is configured to be controlled by the decision stage 220 to calculate the BWE parameter 106 for a first band not including the first frequency bandwidth in the first time portion 204 a of the audio signal 105. The BWE module 230 is further configured to calculate the BWE parameter 106 for a second band not including the second bandwidth in the second time portion 204 b of the audio signal 105. The first (second) band hence comprises frequency components of the audio signal 105 which are outside the first (second) frequency bandwidth and are limited towards the lower end of the spectrum by the crossover frequency fx. The first or the second bandwidth can therefore be defined by a variable crossover frequency which is controlled by the decision stage 220.

In addition, the BWE module 230 may comprise a switch controlled by the decision stage 220. The decision stage 220 may determine an advantageous coding algorithm for a given time portion and controls the switch so that during the given time portion the advantageous coder is used. The modified coding mode information 108′ comprises the corresponding switch signal. Moreover, the BWE module 230 may also comprise a filter to obtain components of the audio signal 105 in the lower/upper frequency band which are separated by the crossover frequency fx which may comprise a value of about 4 kHz or 5 kHz. Finally, the BWE module 130 may also comprise an analyzing tool to determine the BWE parameter 106. The modified coding mode information 108′ may be equivalent (or equal) to the coding mode information 108. The coding mode information 108 indicates, for example, the used coding algorithm for the respective time portions in the bitstream of the encoded audio signal 105.

According to further embodiments, the decision stage 220 comprises a signal classifier tool which analyzes the original input signal 105 and generates the control information 108 which triggers the selection of the different coding modes. The analysis of the input signal 105 is implementation dependent with the aim to choose the optimal core coding mode for a given input signal frame. The output of the signal classifier can (optionally) also be used to influence the behavior of other tools, for example, MPEG surround, enhanced SBR, time-warped filterbank and others. The input to the signal classifier tool comprises, for example, the original unmodified input signal 105, but also optionally additional implementation dependent parameters. The output of the signal classifier tool comprises the control signal 108 to control the selection of the core codec (for example non-LP filtered frequency domain or LP filtered time or frequency domain coding or further coding algorithms).

According to embodiments, the crossover frequency fx is adjusted in a signal dependent manner, which is combined with the switching decision to use a different coding algorithm. Therefore, a simple switch signal may simply be a change (a jump) in the crossover frequency fx. In addition, the coding mode information 108 may also comprise the change of the crossover frequency fx indicating at the same time an advantageous coding scheme (e.g. speech/audio/music).
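For the embodiment in which a jump in the crossover frequency itself serves as the switch signal, a receiver could locate the switching points as sketched below. The function name `switch_points` and the per-frame list of crossover values are assumptions for illustration; note that in other embodiments described above the crossover frequency may change without a change of coder, in which case this simple rule would not apply.

```python
def switch_points(crossovers_hz):
    """Indices of frames whose crossover frequency differs from the previous frame."""
    return [i for i in range(1, len(crossovers_hz))
            if crossovers_hz[i] != crossovers_hz[i - 1]]


# Assumed per-frame crossover frequencies in Hz.
per_frame_fx = [5000, 5000, 4000, 4000, 4000, 5000]
points = switch_points(per_frame_fx)
```

Here the jumps at frames 2 and 5 would be read as a switch to the transform-based coder and back to the LPC-based coder, respectively.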

According to further embodiments the decision stage 220 is operative to analyze the audio signal 105 or a first output of the first encoder 210 a or a second output of the second encoder 210 b or a signal obtained by decoding an output signal of the encoder 210 a or the second encoder 210 b with respect to a target function. The decision stage 220 may optionally be operative to perform a speech/music discrimination in such a way that a decision to speech is favored with respect to a decision to music so that a decision to speech is taken, e.g., even when a portion less than 50% of a frame for the first switch is speech and a portion more than 50% of the frame for the first switch is music. Therefore, the decision stage 220 may comprise an analysis tool that analyzes the audio signal to decide whether the audio signal is mainly a speech signal or mainly a music signal so that based on the result the decision stage can decide which is the best codec to be used for the analyzed time portion of the audio signal.
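The speech-favoring discrimination can be expressed as a threshold below one half. The following sketch assumes that some classifier already provides the fraction of a frame judged to be speech; the threshold value of 0.4 and the function name `choose_coder` are illustrative assumptions, since the description only states that speech may be chosen even when less than 50% of the frame is speech.

```python
def choose_coder(speech_fraction, speech_threshold=0.4):
    """Return "lpc" for a speech decision, "transform" for a music decision.

    A threshold below 0.5 biases the decision towards speech.
    """
    if not 0.0 <= speech_fraction <= 1.0:
        raise ValueError("speech_fraction must lie in [0, 1]")
    return "lpc" if speech_fraction >= speech_threshold else "transform"


mostly_music_but_speech_wins = choose_coder(0.45)  # 45% speech, 55% music
clearly_music = choose_coder(0.30)
```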

FIGS. 1 and 2 do not show many of these details for the encoder/decoder. Possible detailed examples for the encoder/decoder are shown in the following figures. In addition to the first and second decoder 110 a,b of FIG. 1 further decoders may be present which may or may not use e.g. further encoding algorithms. In the same way, also the encoder 200 of FIG. 2 may comprise additional encoders which may use additional encoding algorithms. In the following the example with two encoders/decoders will be explained in more detail.

FIG. 3 illustrates in more detail an encoder having two cascaded switches. A mono signal, a stereo signal or a multi-channel signal is input into a decision stage 220 and into a switch 232 which is part of the BWE module 230 of FIG. 2. The switch 232 is controlled by the decision stage 220. Alternatively, the decision stage 220 may also receive side information which is included in the mono signal, the stereo signal or the multi-channel signal or is at least associated with such a signal, where such information exists and was, for example, generated when originally producing the mono signal, the stereo signal or the multi-channel signal.

The decision stage 220 actuates the switch 232 in order to feed a signal either into a frequency encoding portion 210 b illustrated now at an upper branch of FIG. 3 or into an LPC-domain encoding portion 210 a illustrated at a lower branch in FIG. 3. A key element of the frequency domain encoding branch is a spectral conversion block 410 which is operative to convert a common preprocessing stage output signal (as discussed later on) into a spectral domain. The spectral conversion block may include an MDCT algorithm, a QMF, an FFT algorithm, a Wavelet analysis or a filterbank such as a critically sampled filterbank having a certain number of filterbank channels, where the subband signals in this filterbank may be real valued signals or complex valued signals. The output of the spectral conversion block 410 is encoded using a spectral audio encoder 421 which may include processing blocks as known from the AAC coding scheme.

Generally, the processing in branch 210 b is a processing based on a perception based model or information sink model. Thus, this branch models the human auditory system receiving sound. Contrary thereto, the processing in branch 210 a is to generate a signal in the excitation, residual or LPC domain. Generally, the processing in branch 210 a is a processing based on a speech model or an information generation model. For speech signals, this model is a model of the human speech/sound generation system generating sound. If, however, a sound from a different source requiring a different sound generation model is to be encoded, then the processing in branch 210 a may be different. In addition to the shown coding branches, further embodiments comprise additional branches or core coders. For example, different coders may optionally be present for the different sources, so that sound from each source may be coded by employing an advantageous coder.

In the lower encoding branch 210 a, a key element is an LPC device 510 which outputs LPC information which is used for controlling the characteristics of an LPC filter. This LPC information is transmitted to a decoder. The LPC stage 510 output signal is an LPC-domain signal which consists of an excitation signal and/or a weighted signal.

The LPC device generally outputs an LPC domain signal which can be any signal in the LPC domain or any other signal which has been generated by applying LPC filter coefficients to an audio signal. Furthermore, an LPC device can also determine these coefficients and can also quantize/encode these coefficients.

The decision in the decision stage 220 can be signal-adaptive so that the decision stage performs a music/speech discrimination and controls the switch 232 in such a way that music signals are input into the upper branch 210 b, and speech signals are input into the lower branch 210 a. In one embodiment, the decision stage 220 feeds its decision information into an output bit stream so that a decoder can use this decision information in order to perform the correct decoding operations. This decision information may, for example, comprise the coding mode information 108 which may also comprise information about the crossover frequency fx or a change of the crossover frequency fx.

Such a decoder is illustrated in FIG. 4. The signal output of the spectral audio encoder 421 is, after transmission, input into a spectral audio decoder 431. The output of the spectral audio decoder 431 is input into a time-domain converter 440 (the time-domain converter may in general be a converter from a first to a second domain). Analogously, the output of the LPC domain encoding branch 210 a of FIG. 3 is received on the decoder side and processed by elements 531, 533, 534, and 532 for obtaining an LPC excitation signal. The LPC excitation signal is input into an LPC synthesis stage 540 which receives, as a further input, the LPC information generated by the corresponding LPC analysis stage 510. The output of the time-domain converter 440 and/or the output of the LPC synthesis stage 540 are input into a switch 132 which may be part of the BWE module 130 in FIG. 1. The switch 132 is controlled via a switch control signal (such as the coding mode information 108 and/or the BWE parameter 106) which was, for example, generated by the decision stage 220, or which was externally provided such as by a creator of the original mono signal, stereo signal or multi-channel signal.

In FIG. 3, the input signal into the switch 232 and the decision stage 220 can be a mono signal, a stereo signal, a multi-channel signal or generally any audio signal.

Depending on the decision which can be derived from the switch 232 input signal or from any external source such as a producer of the original audio signal underlying the signal input into stage 232, the switch switches between the frequency encoding branch 210 b and the LPC encoding branch 210 a. The frequency encoding branch 210 b comprises a spectral conversion stage 410 and a subsequently connected quantizing/coding stage 421. The quantizing/coding stage can include any of the functionalities as known from modern frequency-domain encoders such as the AAC encoder. Furthermore, the quantization operation in the quantizing/coding stage 421 can be controlled via a psychoacoustic module which generates psychoacoustic information such as a psychoacoustic masking threshold over the frequency, where this information is input into the stage 421.

In the LPC encoding branch 210 a, the switch output signal is processed via an LPC analysis stage 510 generating LPC side info and an LPC-domain signal. The excitation encoder may comprise an additional switch 521 for switching the further processing of the LPC-domain signal between a quantization/coding operation 522 in the LPC-domain or a quantization/coding stage 524 which processes values in the LPC-spectral domain. To this end, a spectral converter 523 is provided at the input of the quantizing/coding stage 524. The switch 521 is controlled in an open loop fashion or a closed loop fashion depending on specific settings as, for example, described in the AMR-WB+ technical specification.

For the closed loop control mode, the encoder additionally includes an inverse quantizer/coder 531 for the LPC domain signal, an inverse quantizer/coder 533 for the LPC spectral domain signal and an inverse spectral converter 534 for the output of item 533. Both encoded and again decoded signals in the processing branches of the second encoding branch are input into the switch control device 525. In the switch control device 525, these two output signals are compared to each other and/or to a target function, or a target function is calculated which may be based on a comparison of the distortion in both signals, so that the signal having the lower distortion is used for deciding which position the switch 521 should take. Alternatively, in case both branches provide non-constant bit rates, the branch providing the lower bit rate might be selected even when the distortion or the perceptual distortion of this branch is higher than the distortion or perceptual distortion of the other branch (an example of a distortion measure is the signal to noise ratio). Alternatively, the target function could use, as an input, the distortion of each signal and a bit rate of each signal and/or additional criteria in order to find the best decision for a specific goal. If, for example, the goal is such that the bit rate should be as low as possible, then the target function would heavily rely on the bit rate of the two signals output by the elements 531, 534. However, when the main goal is to have the best quality for a certain bit rate, then the switch control 525 might, for example, discard each signal which is above the allowed bit rate, and when both signals are below the allowed bit rate, the switch control would select the signal having the better estimated subjective quality, i.e., having the smaller quantization/coding distortions or a better signal to noise ratio.
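The "best quality for a certain bit rate" rule described above can be sketched as follows. This is only an illustrative sketch: the names `Candidate` and `select_candidate`, the use of the signal to noise ratio as the sole quality estimate, and the numeric values in the usage note are assumptions for illustration and are not taken from the embodiments.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One encoded-and-decoded-again signal offered to the switch control 525."""
    name: str         # hypothetical label, e.g. "lpc_domain" or "lpc_spectral"
    bit_rate: float   # bits per second needed by this candidate
    snr_db: float     # signal to noise ratio, used here as the quality estimate


def select_candidate(candidates: Sequence[Candidate],
                     max_bit_rate: float) -> Optional[Candidate]:
    """Discard candidates above the allowed bit rate, then pick the best SNR.

    Returns None when no candidate fits the bit rate budget.
    """
    allowed = [c for c in candidates if c.bit_rate <= max_bit_rate]
    if not allowed:
        return None
    return max(allowed, key=lambda c: c.snr_db)
```

With, for instance, an LPC-domain candidate at 12 kbit/s and 14 dB and an LPC-spectral candidate at 16 kbit/s and 18 dB, a budget of 20 kbit/s would select the second candidate, whereas a budget of 13 kbit/s would leave only the first.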

The decoding scheme in accordance with an embodiment is, as stated before, illustrated in FIG. 4. For each of the three possible output signal kinds, a specific decoding/re-quantizing stage 431, 531 or 533 exists. While stage 431 outputs a frequency-spectrum which is converted into the time-domain using the frequency/time converter 440, stage 531 outputs an LPC-domain signal, and item 533 outputs an LPC-spectrum. In order to make sure that the input signals into switch 532 are both in the LPC-domain, the LPC-spectrum/LPC-converter 534 is provided. The output data of the switch 532 is transformed back into the time-domain using an LPC synthesis stage 540 which is controlled via encoder-side generated and transmitted LPC information. Then, subsequent to block 540, both branches have time-domain information which is switched in accordance with a switch control signal in order to finally obtain an audio signal such as a mono signal, a stereo signal or a multi-channel signal which depends on the signal input into the encoding scheme of FIG. 3.

FIGS. 5 and 6 show further embodiments for the encoder/decoder, wherein the BWE stages as part of the BWE modules 130, 230 represent a common processing unit.

FIG. 5 illustrates an encoding scheme, wherein the common preprocessing scheme connected to the switch 232 input may comprise a surround/joint stereo block 101 which generates, as an output, joint stereo parameters and a mono output signal which is generated by downmixing the input signal which is a signal having two or more channels. Generally, the signal at the output of block 101 can also be a signal having more channels, but due to the downmixing functionality of block 101, the number of channels at the output of block 101 will be smaller than the number of channels input into block 101.

The common preprocessing scheme may comprise, in addition to the block 101, a bandwidth extension stage 230. In the FIG. 5 embodiment, the output of block 101 is input into the bandwidth extension block 230 which outputs a band-limited signal such as the low band signal or the low pass signal at its output. Advantageously, this signal is downsampled (e.g. by a factor of two) as well. Furthermore, for the high band of the signal input into block 230, bandwidth extension parameters 106 such as spectral envelope parameters, inverse filtering parameters, noise floor parameters etc., as known from the HE-AAC profile of MPEG-4, are generated and forwarded to a bitstream multiplexer 800.

Advantageously, the decision stage 220 receives the signal input into block 101 or input into block 230 in order to decide between, for example, a music mode or a speech mode. In the music mode, the upper encoding branch 210 b (second encoder in FIG. 2) is selected, while, in the speech mode, the lower encoding branch 210 a is selected. Advantageously, the decision stage additionally controls the joint stereo block 101 and/or the bandwidth extension block 230 to adapt the functionality of these blocks to the specific signal. Thus, when the decision stage 220 determines that a certain time portion of the input signal corresponds to the first mode such as the music mode, then specific features of block 101 and/or block 230 can be controlled by the decision stage 220. Alternatively, when the decision stage 220 determines that the signal corresponds to a speech mode or, generally, to a second LPC-domain mode, then specific features of blocks 101 and 230 can be controlled in accordance with the decision stage output. The decision stage 220 also yields the control information 108 and/or the crossover frequency fx, which may also be transmitted to the BWE block 230 and, in addition, to a bitstream multiplexer 800 so that it will be transmitted to the decoder side.

Advantageously, the spectral conversion of the coding branch 210 b is done using an MDCT operation which, even more advantageously, is the time-warped MDCT operation, where the strength or, generally, the warping strength can be controlled between zero and a high warping strength. At zero warping strength, the MDCT operation in block 411 is a straightforward MDCT operation known in the art. The time warping strength together with time warping side information can be transmitted/input into the bitstream multiplexer 800 as side information.

In the LPC encoding branch, the LPC-domain encoder may include an ACELP core 526 calculating a pitch gain, a pitch lag and/or codebook information such as a codebook index and gain. The TCX mode as known from 3GPP TS 26.290 includes a processing of a perceptually weighted signal in the transform domain. A Fourier transformed weighted signal is quantized using a split multi-rate lattice quantization (algebraic VQ) with noise factor quantization. A transform is calculated in 1024, 512, or 256 sample windows. The excitation signal is recovered by inverse filtering the quantized weighted signal through an inverse weighting filter. The TCX mode may also be used in modified form in which the MDCT is used with an enlarged overlap, scalar quantization, and an arithmetic coder for encoding spectral lines.

In the “music” coding branch 210 b, a spectral converter advantageously comprises a specifically adapted MDCT operation having certain window functions followed by a quantization/entropy encoding stage which may consist of a single vector quantization stage, but advantageously is a combined scalar quantizer/entropy coder similar to the quantizer/coder in the frequency domain coding branch, i.e., in item 421 of FIG. 5.

In the “speech” coding branch 210 a, there is the LPC block 510 followed by a switch 521, again followed by an ACELP block 526 or a TCX block 527. ACELP is described in 3GPP TS 26.190 and TCX is described in 3GPP TS 26.290. Generally, the ACELP block 526 receives an LPC excitation signal as calculated by a procedure as described in FIG. 7. The TCX block 527 receives a weighted signal as generated by FIG. 8.

At the decoder side illustrated in FIG. 6, after the inverse spectral transform in block 537, the inverse of the weighting filter is applied, that is, (1−μz⁻¹)/(1−A(z/γ)). Then, the signal is filtered through (1−A(z)) to go to the LPC excitation domain. Thus, the conversion to LPC domain block 534 and the TCX⁻¹ block 537 include an inverse transform and then filtering through

$\frac{\left( {1 - {\mu \; z^{- 1}}} \right)}{\left( {1 - {A\left( {z/\gamma} \right)}} \right)}\left( {1 - {A(z)}} \right)$

to convert from the weighted domain to the excitation domain.

Although item 510 in FIGS. 3, 5 illustrates a single block, block 510 can output different signals as long as these signals are in the LPC domain. The actual mode of block 510 such as the excitation signal mode or the weighted signal mode can depend on the actual switch state. Alternatively, the block 510 can have two parallel processing devices, where one device is implemented similar to FIG. 7 and the other device is implemented as FIG. 8. Hence, the LPC domain at the output of 510 can represent either the LPC excitation signal or the LPC weighted signal or any other LPC domain signal.

In the second encoding branch (ACELP/TCX) of FIG. 5, the signal is advantageously pre-emphasized through a filter 1−μz⁻¹ before encoding. At the ACELP/TCX decoder in FIG. 6 the synthesized signal is deemphasized with the filter 1/(1−μz⁻¹). In an advantageous embodiment, the parameter μ has the value 0.68. The preemphasis can be part of the LPC block 510 where the signal is preemphasized before LPC analysis and quantization. Similarly, deemphasis can be part of the LPC synthesis block LPC⁻¹ 540.
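As a minimal sketch of this filter pair, the pre-emphasis 1−μz⁻¹ corresponds to the difference equation y[n] = x[n] − μ·x[n−1], and the de-emphasis 1/(1−μz⁻¹) to y[n] = x[n] + μ·y[n−1]. The function names and the zero initial filter state below are assumptions made for illustration only.

```python
from typing import List, Sequence

MU = 0.68  # value of the parameter mu named in the advantageous embodiment


def preemphasize(x: Sequence[float], mu: float = MU) -> List[float]:
    """Apply the FIR filter 1 - mu*z^-1, assuming a zero initial state."""
    out = []
    previous_input = 0.0
    for sample in x:
        out.append(sample - mu * previous_input)
        previous_input = sample
    return out


def deemphasize(x: Sequence[float], mu: float = MU) -> List[float]:
    """Apply the IIR filter 1 / (1 - mu*z^-1), assuming a zero initial state."""
    out = []
    previous_output = 0.0
    for sample in x:
        previous_output = sample + mu * previous_output
        out.append(previous_output)
    return out
```

Because the second filter is the exact inverse of the first, applying `deemphasize` to the output of `preemphasize` reproduces the input up to floating-point rounding, provided no quantization takes place in between.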

FIG. 6 illustrates a decoding scheme corresponding to the encoding scheme of FIG. 5. The bitstream generated by bitstream multiplexer 800 (or output interface) of FIG. 5 is input into a bitstream demultiplexer 900 (or input interface). Depending on an information derived, for example, from the bitstream via a mode detection block 601 (e.g. part of the controller 140 in FIG. 1), a decoder-side switch 132 is controlled to either forward signals from the upper branch or signals from the lower branch to the bandwidth extension block 701. The bandwidth extension block 701 receives, from the bitstream demultiplexer 900, side information and, based on this side information and the output of the mode detection 601, reconstructs the high band based on the low band output by switch 132. The control signal 108 controls the used crossover frequency fx.

The full band signal generated by block 701 is input into the joint stereo/surround processing stage 702 which reconstructs two stereo channels or several multi-channels. Generally, block 702 will output more channels than were input into this block. Depending on the application, the input into block 702 may even include two channels such as in a stereo mode and may even include more channels as long as the output of this block has more channels than the input into this block.

The switch 232 in FIG. 5 has been shown to switch between both branches so that only one branch receives a signal to process and the other branch does not receive a signal to process. In an alternative embodiment, however, the switch 232 may also be arranged subsequent to, for example, the audio encoder 421 and the excitation encoder 522, 523, 524, which means that both branches 210 a, 210 b process the same signal in parallel. In order not to double the bitrate, however, only the signal output of one of those encoding branches 210 a or 210 b is selected to be written into the output bitstream. The decision stage will then operate so that the signal written into the bitstream minimizes a certain cost function, where the cost function can be the generated bitrate or the generated perceptual distortion or a combined rate/distortion cost function. Therefore, either in this mode or in the mode illustrated in the Figures, the decision stage can also operate in a closed loop mode in order to make sure that, finally, only that encoding branch output is written into the bitstream which has, for a given perceptual distortion, the lowest bitrate or, for a given bitrate, the lowest perceptual distortion. In the closed loop mode, the feedback input may be derived from outputs of the three quantizer/scaler blocks 421, 522 and 524 in FIG. 3.

Also in the embodiment of FIG. 6, the switch 132 may in alternative embodiments be arranged after the BWE module 701 so that the bandwidth extension is performed in parallel for both branches and the switch selects one of the two bandwidth extended signals.

In the implementation having two switches, i.e., the first switch 232 and the second switch 521, it is advantageous that the time resolution for the first switch is lower than the time resolution for the second switch. Stated differently, the blocks of the input signal into the first switch which can be switched via a switch operation are larger than the blocks switched by the second switch 521 operating in the LPC-domain. Exemplarily, the frequency domain/LPC-domain switch 232 may switch blocks of a length of 1024 samples, and the second switch 521 can switch blocks having 256 samples each.
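The two time resolutions can be illustrated with a small sketch that lays out which samples are covered by each switch decision. The block lengths 1024 and 256 are the exemplary values given above; the mode labels ("freq", "lpc", "acelp", "tcx"), the function name `segment` and the input layout are assumptions chosen only for this illustration.

```python
from typing import Dict, List, Sequence, Tuple

OUTER_BLOCK = 1024  # samples per decision of the first switch 232
INNER_BLOCK = 256   # samples per decision of the second switch 521


def segment(outer_modes: Sequence[str],
            inner_modes: Dict[int, Sequence[str]]) -> List[Tuple[int, int, str]]:
    """Return (start_sample, length, mode) for every switched block.

    outer_modes holds one entry per 1024-sample block: "freq" or "lpc".
    inner_modes maps the index of each "lpc" block to its four
    256-sample sub-block modes, e.g. "acelp" or "tcx".
    """
    segments = []
    for index, mode in enumerate(outer_modes):
        start = index * OUTER_BLOCK
        if mode == "freq":
            # The frequency-domain branch keeps the coarse block as a whole.
            segments.append((start, OUTER_BLOCK, "freq"))
        else:
            # The LPC branch is switched again at the finer resolution.
            for sub_index, sub_mode in enumerate(inner_modes[index]):
                segments.append(
                    (start + sub_index * INNER_BLOCK, INNER_BLOCK, sub_mode))
    return segments
```

For one frequency-domain block followed by one LPC-domain block, the sketch yields one segment of 1024 samples and four segments of 256 samples, so the second switch can change its position four times while the first switch holds one position.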

FIG. 7 illustrates a more detailed implementation of the LPC analysis block 510. The audio signal is input into a filter determination block 83 which determines the filter information A(z). This information is output as the short-term prediction information that may be used by a decoder. The short-term prediction information is also used by the actual prediction filter 85. In a subtracter 86, a current sample of the audio signal is input and a predicted value for the current sample is subtracted so that, for this sample, the prediction error signal is generated at line 84.
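The subtraction in FIG. 7 can be sketched as follows, assuming that the prediction filter forms its predicted value as a weighted sum of past samples, so that e[n] = s[n] − Σₖ aₖ·s[n−k]. The function name, the coefficient convention and the zero history before the first sample are assumptions; the determination of the coefficients themselves (block 83) is not shown.

```python
from typing import List, Sequence


def prediction_error(signal: Sequence[float],
                     coeffs: Sequence[float]) -> List[float]:
    """Compute e[n] = s[n] - sum_k coeffs[k-1] * s[n-k] for k = 1..len(coeffs).

    Samples before the start of the signal are treated as zero.
    """
    error = []
    for n, sample in enumerate(signal):
        # Role of the prediction filter 85: predict s[n] from past samples.
        predicted = sum(a * signal[n - k]
                        for k, a in enumerate(coeffs, start=1)
                        if n - k >= 0)
        # Role of the subtracter 86: current sample minus predicted value.
        error.append(sample - predicted)
    return error
```

A signal that the predictor models perfectly, such as a sequence in which every sample is half of its predecessor together with the single coefficient 0.5, leaves a prediction error that is zero everywhere except at the first sample.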

While FIG. 7 illustrates an advantageous way to calculate the excitation signal, FIG. 8 illustrates an advantageous way to calculate the weighted signal. In contrast to FIG. 7, the filter 85 is different when γ is different from 1. A value smaller than 1 is advantageous for γ. Furthermore, the block 87 is present, and μ is advantageously a number smaller than 1. Generally, the elements in FIGS. 7 and 8 can be implemented as in 3GPP TS 26.190 or 3GPP TS 26.290.

Subsequently, an analysis-by-synthesis CELP encoder is discussed in order to illustrate the modifications applied to this algorithm. This CELP encoder is discussed in detail in “Speech Coding: A Tutorial Review”, Andreas Spanias, Proceedings of the IEEE, Vol. 82, No. 10, October 1994, pages 1541-1582.

For specific cases, when a frame is a mixture of unvoiced and voiced speech or when speech over music occurs, a TCX coding can be more appropriate to code the excitation in the LPC domain. The TCX coding processes the excitation directly in the frequency domain without making any assumption about the excitation production. TCX is thus more generic than CELP coding and is not restricted to a voiced or a non-voiced source model of the excitation. TCX is still a source-filter model coding using a linear predictive filter for modelling the formants of the speech-like signals.

In the AMR-WB+-like coding, a selection between different TCX modes and ACELP takes place as known from the AMR-WB+ description. The TCX modes differ in that the length of the block-wise Fast Fourier Transform is different for different modes, and the best mode can be selected by an analysis by synthesis approach or by a direct “feedforward” mode.

As discussed in connection with FIGS. 5 and 6, the common pre-processing stage 100 advantageously includes a joint multi-channel (surround/joint stereo device) 101 and, additionally, a bandwidth extension stage 230. Correspondingly, the decoder includes a bandwidth extension stage 701 and a subsequently connected joint multichannel stage 702. Advantageously, the joint multichannel stage 101 is, with respect to the encoder, connected before the bandwidth extension stage 230, and, on the decoder side, the bandwidth extension stage 701 is connected before the joint multichannel stage 702 with respect to the signal processing direction. Alternatively, however, the common pre-processing stage can include a joint multichannel stage without the subsequently connected bandwidth extension stage or a bandwidth extension stage without a connected joint multichannel stage.

FIGS. 9 a and 9 b show a simplified view of the encoder of FIG. 5, where the encoder comprises the switch-decision unit 220 and the stereo coding unit 101. In addition, the encoder also comprises the bandwidth extension tools 230 as, for example, an envelope data calculator and SBR-related modules. The switch-decision unit 220 provides a switch decision signal 108′ that switches between the audio coder 210 b and the speech coder 210 a. The speech coder 210 a may further be divided into a voiced and an unvoiced coder. Each of these coders may encode the audio signal in the core frequency band using different numbers of sample values (e.g. 1024 for a higher resolution or 256 for a lower resolution). The switch decision signal 108′ is also supplied to the bandwidth extension (BWE) tool 230. The BWE tool 230 will then use the switch decision 108′ in order, for example, to adjust the number of the spectral envelopes 104, to turn on/off an optional transient detector and to adjust the crossover frequency fx. The audio signal 105 is input into the switch-decision unit 220 and is input into the stereo coding 101 so that the stereo coding 101 may produce the sample values which are input into the bandwidth extension unit 230. Depending on the decision 108′ generated by the switch-decision unit 220, the bandwidth extension tool 230 will generate spectral band replication data which are, in turn, forwarded either to an audio coder 210 b or a speech coder 210 a.

The switch decision signal 108′ is signal dependent and can be obtained from the switch-decision unit 220 by analyzing the audio signal, e.g., by using a transient detector or other detectors which may or may not comprise a variable threshold. Alternatively, the switch decision signal 108′ may be adjusted manually (e.g. by a user) or be obtained from a data stream (included in the audio signal).

The output of the audio coder 210 b and the speech coder 210 a may again be input into the bitstream formatter 800 (see FIG. 5).

FIG. 9 b shows an example for the switch decision signal 108′ which detects an audio signal for a time period before a first time ta and after a second time tb. Between the first time ta and the second time tb, the switch-decision unit 220 detects a speech signal resulting in different discrete values for the switch decision signal 108′.

The decision to use a higher crossover frequency fx is controlled by the switching decision unit 220. This means that the described method is also usable within a system in which the SBR module is combined with only a single core coder and a variable crossover frequency fx.

Although some of the FIGS. 1 through 9 are illustrated as block diagrams of an apparatus, these figures simultaneously are an illustration of a method, where the block functionalities correspond to the method steps.

FIG. 10 illustrates a representation for an encoded audio signal 102 comprising the first portion 104 a, the second portion 104 b, a third portion 104 c and a fourth portion 104 d. In this representation the encoded audio signal 102 is a bitstream transmitted over a transmission channel which furthermore comprises the coding mode information 108. Each portion 104 of the encoded audio signal 102 may represent a different time portion, although different portions 104 may be in the frequency as well as time domain so that the encoded audio signal 102 may not represent a time line.

In this embodiment the encoded audio signal 102 comprises in addition a first coding mode information 108 a identifying the used coding algorithm for the first portion 104 a; a second coding mode information 108 b identifying the used coding algorithm for the second portion 104 b; and a third coding mode information 108 d identifying the used coding algorithm for the fourth portion 104 d. The first coding mode information 108 a may also identify the used first crossover frequency fx1 within the first portion 104 a, and the second coding mode information 108 b may also identify the used second crossover frequency fx2 within the second portion 104 b. For example, within the first portion 104 a the “speech” coding mode may be used and within the second portion 104 b the “music” coding mode may be used so that the first crossover frequency fx1 may be higher than the second crossover frequency fx2.

In this exemplary embodiment the encoded audio signal 102 comprises no coding mode information for the third portion 104 c, which indicates that there is no change in the used encoder and/or crossover frequency fx between the first and third portion 104 a, c. Therefore, the coding mode information 108 may appear as header only for those portions 104 which use a different core coder and/or crossover frequency compared to the preceding portion. In further embodiments, instead of signaling the values of the crossover frequencies for the different portions 104, the coding mode information 108 may comprise a single bit indicating the core coder (first or second encoder 210 a,b) used for the respective portion 104.
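A decoder-side reading of such a header-only-on-change scheme could be sketched as follows, under the assumption that a portion without its own header inherits the mode of the portion preceding it in bitstream order. The class names, the field names and the crossover values used in the usage note are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class CodingMode:
    core_coder: str       # hypothetical label: "speech" or "music"
    crossover_hz: float   # crossover frequency fx signaled with the mode


@dataclass(frozen=True)
class Portion:
    payload: bytes
    mode_header: Optional[CodingMode] = None  # present only when the mode changes


def resolve_modes(portions: Sequence[Portion]) -> List[CodingMode]:
    """Return the effective coding mode for every portion, in order."""
    current: Optional[CodingMode] = None
    resolved = []
    for portion in portions:
        if portion.mode_header is not None:
            current = portion.mode_header  # explicit header: switch mode
        if current is None:
            raise ValueError("the first portion must carry coding mode information")
        resolved.append(current)           # no header: keep the previous mode
    return resolved
```

For four portions of which only the first, second and fourth carry a header, the third portion is decoded with the mode in force before it, so no bits are spent on repeating unchanged mode information.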

Therefore, the signaling of the switch behavior between the different SBR tools can be done by submitting, for example, a specific bit within the bitstream, so that this specific bit may turn on or off a specific behavior in the decoder. Alternatively, in systems with two core coders according to embodiments, the signaling of the switch may also be initiated by analyzing the core codec. In this case the submission of the adaptation of the SBR tools is done implicitly, which means that it is determined by the corresponding core coder activity.

More details about the standard description of the bitstream elements for the SBR payload can be found in ISO/IEC 14496-3, sub-clause 4.5.2.8. A modification of this standard bitstream comprises an extension of the index to the master frequency table (to identify the used crossover frequency). The used index is coded, for example, with four bits allowing the crossover band to be variable over a range of 0 to 15 bands.
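How a four-bit index covers the range 0 to 15 can be sketched with a generic bit writer and reader. This is not the syntax of ISO/IEC 14496-3; the function names, the representation of the bitstream as a list of bits and the MSB-first bit order are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

INDEX_BITS = 4  # four bits allow index values 0..15


def write_index(bits: List[int], index: int) -> None:
    """Append a crossover band index to a list of bits, MSB first (assumed order)."""
    if not 0 <= index < (1 << INDEX_BITS):
        raise ValueError("index does not fit into four bits")
    for shift in range(INDEX_BITS - 1, -1, -1):
        bits.append((index >> shift) & 1)


def read_index(bits: Sequence[int], position: int) -> Tuple[int, int]:
    """Read a four-bit index starting at position; return (index, next position)."""
    value = 0
    for bit in bits[position:position + INDEX_BITS]:
        value = (value << 1) | bit
    return value, position + INDEX_BITS
```

Writing the index 11 appends the bits 1, 0, 1, 1, and reading four bits from the same position recovers 11; an index of 16 or more would need a fifth bit and is rejected.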

Embodiments of the present invention can hence be summarized as follows. Different signals with different time/frequency characteristics place different demands on the characteristics of the bandwidth extension. Transient signals (e.g. within a speech signal) need a fine temporal resolution of the BWE, and the crossover frequency fx (the upper frequency border of the core coder) should be as high as possible (e.g. 4 kHz or 5 kHz or 6 kHz). Especially in voiced speech, a distorted temporal structure can decrease perceived quality. Tonal signals need a stable reproduction of spectral components and a matching harmonic pattern of the reproduced high frequency portions. The stable reproduction of tonal parts limits the core coder bandwidth; it does not need a BWE with fine temporal resolution but rather one with finer spectral resolution. In a switched speech/audio core coder design, it is possible to use the core coder decision also to adapt both the temporal and spectral characteristics of the BWE as well as to adapt the BWE start frequency (crossover frequency) to the signal characteristics. Therefore, embodiments provide a bandwidth extension where the core coder decision acts as adaptation criterion for the bandwidth extension characteristics.

The signaling of the changed BWE start (crossover) frequency can be realized explicitly by sending additional information (as, for example, the coding mode information 108) in the bitstream or implicitly by deriving the crossover frequency fx directly from the core coder used (in case the core coder is, e.g., signaled within the bitstream). For example, a lower BWE crossover frequency fx may be used for the transform coder (for example, an audio/music coder) and a higher one for a time domain (speech) coder. In this case, the crossover frequency may lie in the range from 0 Hz up to the Nyquist frequency.
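The two signaling variants can be sketched in one controller routine: an explicitly transmitted value is used when present, and otherwise the crossover frequency is derived from the core coder in use. The table values of 4 kHz and 6 kHz, the coder labels and the function name are assumptions for illustration; only the ordering (higher for the time domain coder) and the range limit up to the Nyquist frequency follow the description above.

```python
from typing import Optional

# Illustrative values only: lower fx for the transform coder,
# higher fx for the time domain (LPC-based) coder.
IMPLICIT_CROSSOVER_HZ = {"transform": 4000.0, "lpc": 6000.0}


def select_crossover_hz(core_coder: str,
                        sample_rate_hz: float,
                        explicit_hz: Optional[float] = None) -> float:
    """Choose the BWE crossover frequency for the current portion.

    explicit_hz models explicit signaling in the bitstream; when it is
    None, the value is derived implicitly from the core coder.
    """
    nyquist_hz = sample_rate_hz / 2.0
    if explicit_hz is not None:
        fx = explicit_hz
    else:
        fx = IMPLICIT_CROSSOVER_HZ[core_coder]
    if not 0.0 <= fx <= nyquist_hz:
        raise ValueError("crossover frequency must lie between 0 Hz and Nyquist")
    return fx
```

With implicit signaling, no additional bits are needed because the decoder already knows which core coder is active; explicit signaling costs side information but allows values other than the per-coder defaults.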

Although some aspects have been described in the context of anapparatus, it is clear that these aspects also represent a descriptionof the corresponding method, where a block or device corresponds to amethod step or a feature of a method step. Analogously, aspectsdescribed in the context of a method step also represent a descriptionof a corresponding block or item or feature of a correspondingapparatus.

The inventive encoded audio signal can be stored on a digital storagemedium or can be transmitted on a transmission medium such as a wirelesstransmission medium or a wired transmission medium such as the Internet.

Depending on certain implementation requirements, embodiments of theinvention can be implemented in hardware or in software. Theimplementation can be performed using a digital storage medium, forexample a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROMor a FLASH memory, having electronically readable control signals storedthereon, which cooperate (or are capable of cooperating) with aprogrammable computer system such that the respective method isperformed.

Some embodiments according to the invention comprise a data carrierhaving electronically readable control signals, which are capable ofcooperating with a programmable computer system, such that one of themethods described herein is performed.

Generally, embodiments of the present invention can be implemented as acomputer program product with a program code, the program code beingoperative for performing one of the methods when the computer programproduct runs on a computer. The program code may for example be storedon a machine readable carrier.

Other embodiments comprise the computer program for performing one ofthe methods described herein, stored on a machine readable carrier.

In other words, an embodiment of the inventive method is, therefore, acomputer program having a program code for performing one of the methodsdescribed herein, when the computer program runs on a computer.

A further embodiment of the inventive methods is, therefore, a datacarrier (or a digital storage medium, or a computer-readable medium)comprising, recorded thereon, the computer program for performing one ofthe methods described herein.

A further embodiment of the inventive method is, therefore, a datastream or a sequence of signals representing the computer program forperforming one of the methods described herein. The data stream or thesequence of signals may for example be configured to be transferred viaa data communication connection, for example via the Internet.

A further embodiment comprises a processing means, for example acomputer, or a programmable logic device, configured to or adapted toperform one of the methods described herein.

A further embodiment comprises a computer having installed thereon thecomputer program for performing one of the methods described herein.

In some embodiments, a programmable logic device (for example a fieldprogrammable gate array) may be used to perform some or all of thefunctionalities of the methods described herein. In some embodiments, afield programmable gate array may cooperate with a microprocessor inorder to perform one of the methods described herein. Generally, themethods are advantageously performed by any hardware apparatus.

While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations and equivalents as fall within the true spirit and scope of the present invention.

1. An apparatus for decoding an encoded audio signal, the encoded audio signal comprising a first portion encoded in accordance with a first encoding algorithm, a second portion encoded in accordance with a second encoding algorithm, BWE parameters for the first portion and the second portion and a coding mode information indicating a first decoding algorithm or a second decoding algorithm, comprising: a first decoder for decoding the first portion in accordance with the first decoding algorithm for a first time portion of the encoded signal to acquire a first decoded signal, wherein the first decoder comprises an LPC-based coder; a second decoder for decoding the second portion in accordance with the second decoding algorithm for a second time portion of the encoded signal to acquire a second decoded signal, wherein the second decoder comprises a transform-based coder; a BWE module comprising a controllable crossover frequency, the BWE module being configured for performing a bandwidth extension algorithm using the first decoded signal and the BWE parameters for the first portion, and for performing a bandwidth extension algorithm using the second decoded signal and the bandwidth extension parameter for the second portion, wherein the BWE module is configured to use a first crossover frequency for the bandwidth extension for the first decoded signal and to use a second crossover frequency for the bandwidth extension for the second decoded signal, wherein the first crossover frequency is higher than the second crossover frequency; and a controller for controlling the crossover frequency for the BWE module in accordance with the coding mode information.
2. The apparatus for decoding of claim 1, further comprising an input interface for inputting the encoded audio signal as a bitstream.
3. The apparatus for decoding of claim 1, wherein the BWE module comprises a switch which is configured to switch between the first and second time portion from the first decoder to the second decoder so that the bandwidth extension algorithm is either applied to the first decoded signal or to the second decoded signal.
4. The apparatus for decoding of claim 3, wherein the controller is configured to control the switch dependent on the indicated decoding algorithm within the coding mode information.
5. The apparatus for decoding of claim 1, wherein the controller is configured to increase the crossover frequency within the first time portion or to decrease the crossover frequency within the second time portion.
6. An apparatus for encoding an audio signal comprising: a first encoder which is configured to encode in accordance with a first encoding algorithm, the first encoding algorithm comprising a first frequency bandwidth, wherein the first encoder comprises an LPC-based coder; a second encoder which is configured to encode in accordance with a second encoding algorithm, the second encoding algorithm comprising a second frequency bandwidth being smaller than the first frequency bandwidth, wherein the second encoder comprises a transform-based coder; a decision stage for indicating the first encoding algorithm for a first portion of the audio signal and for indicating the second encoding algorithm for a second portion of the audio signal, the second portion being different from the first portion; and a bandwidth extension module for calculating BWE parameters for the audio signal, wherein the BWE module is configured to be controlled by the decision stage to calculate the BWE parameters for a band not comprising the first frequency bandwidth in the first portion of the audio signal and for a band not comprising the second frequency bandwidth in the second portion of the audio signal, wherein the first or the second frequency bandwidth is defined by a variable crossover frequency and wherein the decision stage is configured to output the variable crossover frequency, wherein the BWE module is configured to use a first crossover frequency for calculating the BWE parameters for a signal encoded using the first encoder and to use a second crossover frequency for a signal encoded using the second encoder, wherein the first crossover frequency is higher than the second crossover frequency.
7. The apparatus for encoding of claim 6, further comprising an output interface for outputting the encoded audio signal, the encoded audio signal comprising a first portion encoded in accordance with a first encoding algorithm, a second portion encoded in accordance with a second encoding algorithm, BWE parameters for the first portion and the second portion and coding mode information indicating the first decoding algorithm or the second decoding algorithm.
8. The apparatus for encoding of claim 6, wherein the first or the second frequency bandwidth is defined by a variable crossover frequency and wherein the decision stage is configured to output the variable crossover frequency.
9. The apparatus for encoding of claim 6, wherein the BWE module comprises a switch controlled by the decision stage, wherein the switch is configured to switch between the first and second time encoder so that the audio signal is for different time portions either encoded by the first or by the second encoder.
10. The apparatus for encoding of claim 6, wherein the decision stage is operative to analyze the audio signal or a first output of the first encoder or a second output of the second encoder or a signal acquired by decoding an output signal of the first encoder or the second encoder with respect to a target function.
11. A method for decoding an encoded audio signal, the encoded audio signal comprising a first portion encoded in accordance with a first encoding algorithm, a second portion encoded in accordance with a second encoding algorithm, BWE parameters for the first portion and the second portion and a coding mode information indicating a first decoding algorithm or a second decoding algorithm, the method comprising: decoding the first portion in accordance with the first decoding algorithm for a first time portion of the encoded signal to acquire a first decoded signal, wherein decoding the first portion comprises using an LPC-based coder; decoding the second portion in accordance with the second decoding algorithm for a second time portion of the encoded signal to acquire a second decoded signal, wherein decoding the second portion comprises using a transform-based coder; performing a bandwidth extension algorithm by a BWE module comprising a controllable crossover frequency, using the first decoded signal and the BWE parameters for the first portion, and performing, by the BWE module comprising the controllable crossover frequency, a bandwidth extension algorithm using the second decoded signal and the bandwidth extension parameter for the second portion, wherein a first crossover frequency is used for the bandwidth extension for the first decoded signal and a second crossover frequency is used for the bandwidth extension for the second decoded signal, wherein the first crossover frequency is higher than the second crossover frequency; and controlling the crossover frequency for the BWE module in accordance with the coding mode information.
12. A method for encoding an audio signal comprising: encoding in accordance with a first encoding algorithm, the first encoding algorithm comprising a first frequency bandwidth, wherein encoding in accordance with a first encoding algorithm comprises using an LPC-based coder; encoding in accordance with a second encoding algorithm, the second encoding algorithm comprising a second frequency bandwidth being smaller than the first frequency bandwidth, wherein encoding in accordance with a second encoding algorithm comprises using a transform-based coder; indicating the first encoding algorithm for a first portion of the audio signal and the second encoding algorithm for a second portion of the audio signal, the second portion being different from the first portion; and calculating BWE parameters for the audio signal such that the BWE parameters are calculated for a band not comprising the first frequency bandwidth in the first portion of the audio signal and for a band not comprising the second frequency bandwidth in the second portion of the audio signal, wherein the first or the second frequency bandwidth is defined by a variable crossover frequency, wherein the BWE module is configured to use a first crossover frequency for calculating the BWE parameters for a signal encoded using the LPC-based coder and to use a second crossover frequency for a signal encoded using the transform-based coder, wherein the first crossover frequency is higher than the second crossover frequency.
13. An encoded audio signal comprising: a first portion encoded in accordance with a first encoding algorithm, the first encoding algorithm comprising an LPC-based coder; a second portion encoded in accordance with a second different encoding algorithm, the second encoding algorithm comprising a transform-based coder; bandwidth extension parameters for the first portion and the second portion; and a coding mode information indicating a first crossover frequency used for the first portion or a second crossover frequency used for the second portion, wherein the first crossover frequency is higher than the second crossover frequency.
14. A computer program for performing, when running on a computer, the method for encoding an audio signal, said method comprising: encoding in accordance with a first encoding algorithm, the first encoding algorithm comprising a first frequency bandwidth, wherein encoding in accordance with a first encoding algorithm comprises using an LPC-based coder; encoding in accordance with a second encoding algorithm, the second encoding algorithm comprising a second frequency bandwidth being smaller than the first frequency bandwidth, wherein encoding in accordance with a second encoding algorithm comprises using a transform-based coder; indicating the first encoding algorithm for a first portion of the audio signal and the second encoding algorithm for a second portion of the audio signal, the second portion being different from the first portion; and calculating BWE parameters for the audio signal such that the BWE parameters are calculated for a band not comprising the first frequency bandwidth in the first portion of the audio signal and for a band not comprising the second frequency bandwidth in the second portion of the audio signal, wherein the first or the second frequency bandwidth is defined by a variable crossover frequency, wherein the BWE module is configured to use a first crossover frequency for calculating the BWE parameters for a signal encoded using the LPC-based coder and to use a second crossover frequency for a signal encoded using the transform-based coder, wherein the first crossover frequency is higher than the second crossover frequency.
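
As an illustration of the decoder-side control recited in claims 1 and 11, the following is a minimal Python sketch and not the claimed implementation. The names `CodingMode`, `Frame`, `select_crossover_hz` and `plan_bwe`, and the crossover values of 6000 Hz and 4000 Hz, are assumptions introduced here; the claims only require that the crossover frequency used after the LPC-based decoder is higher than the one used after the transform-based decoder. The core decoding and the bandwidth extension itself are omitted.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class CodingMode(Enum):
    """Coding mode information carried per time portion (hypothetical names)."""
    LPC = 1        # first decoding algorithm: LPC-based coder
    TRANSFORM = 2  # second decoding algorithm: transform-based coder


# Assumed example values in Hz. Only the ordering (LPC > transform)
# is taken from the claims; the numbers themselves are illustrative.
CROSSOVER_HZ = {CodingMode.LPC: 6000.0, CodingMode.TRANSFORM: 4000.0}


@dataclass(frozen=True)
class Frame:
    """One time portion of the encoded signal (payload omitted)."""
    index: int
    mode: CodingMode


def select_crossover_hz(mode: CodingMode) -> float:
    """Controller: map the coding mode information to a crossover frequency."""
    return CROSSOVER_HZ[mode]


def plan_bwe(frames: List[Frame]) -> List[Tuple[int, str, float]]:
    """Return, per frame, which decoder output feeds the BWE module and
    the crossover frequency the controller sets for that frame."""
    plan = []
    for frame in frames:
        crossover = select_crossover_hz(frame.mode)
        plan.append((frame.index, frame.mode.name, crossover))
    return plan


if __name__ == "__main__":
    stream = [Frame(0, CodingMode.LPC), Frame(1, CodingMode.TRANSFORM)]
    for index, decoder, crossover in plan_bwe(stream):
        print(f"frame {index}: decoder={decoder}, crossover={crossover} Hz")
```

In this sketch the controller is a plain lookup keyed by the coding mode, so the crossover frequency changes exactly when the stream switches between the two decoders.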